Morphosyntactic annotation of CHILDES transcripts.
Authors
Abstract
Corpora of child language are essential for research in child language acquisition and psycholinguistics. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe a project whose goal is to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. We have produced a corpus of over 18,800 utterances (approximately 65,000 words) with manually curated gold-standard grammatical relation annotations. Using this corpus, we have developed a highly accurate data-driven parser for the English CHILDES data, which we used to automatically annotate the remainder of the English section of CHILDES. We have also extended the parser to Spanish, and are currently working on supporting more languages. The parser and the manually and automatically annotated data are freely available for research purposes.
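As a rough illustration of what "grammatical relations in the form of labeled dependency structures" can look like as data, the sketch below represents each relation as a (dependent, head, label) triple. The example sentence, the label names (`DET`, `SUBJ`, `ROOT`), the `dependent|head|label` string encoding, and the helper names `parse_relations` and `describe` are all assumptions made for illustration; the abstract does not specify them, and this is not the project's parser.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GrammaticalRelation:
    """One labeled dependency between two words of an utterance."""
    dependent: int  # 1-based position of the dependent word
    head: int       # 1-based position of the head word; 0 marks the root
    label: str      # name of the relation (hypothetical label set)


def parse_relations(tier: str) -> List[GrammaticalRelation]:
    """Parse whitespace-separated 'dependent|head|label' items (assumed encoding)."""
    relations = []
    for item in tier.split():
        dependent, head, label = item.split("|")
        relations.append(GrammaticalRelation(int(dependent), int(head), label))
    return relations


def describe(words: List[str], relations: List[GrammaticalRelation]) -> List[str]:
    """Render each relation as LABEL(head_word, dependent_word)."""
    rendered = []
    for rel in relations:
        head_word = "ROOT" if rel.head == 0 else words[rel.head - 1]
        rendered.append(f"{rel.label}({head_word}, {words[rel.dependent - 1]})")
    return rendered


# Hypothetical three-word utterance and its annotation.
words = ["the", "dog", "barks"]
relations = parse_relations("1|2|DET 2|3|SUBJ 3|0|ROOT")
readable = describe(words, relations)
```

Storing word positions rather than word forms keeps the structure unambiguous when the same word occurs more than once in an utterance.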
Similar resources
Parsing Hebrew CHILDES transcripts
We present a syntactic parser of (transcripts of) spoken Hebrew: a dependency parser of the Hebrew CHILDES database. CHILDES is a corpus of child–adult linguistic interactions. Its Hebrew section has recently been morphologically analyzed and disambiguated, paving the way for syntactic annotation. This paper describes a novel annotation scheme of dependency relations reflecting constructions of...
High-accuracy Annotation and Parsing of CHILDES Transcripts
Corpora of child language are essential for psycholinguistic research. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe an ongoing project that aims to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. To d...
Adding Syntactic Annotations to Transcripts of Parent-Child Dialogs
We describe an annotation scheme for syntactic information in the CHILDES database (MacWhinney, 2000), which contains several megabytes of transcribed dialogs between parents and children. The annotation scheme is based on grammatical relations (GRs) that are composed of bilexical dependencies (between a head and a dependent) labeled with the name of the relation involving the two words (such a...
Dependency Annotation for Learner Corpora
Building from the CHILDES dependency annotation scheme and on interlanguage POS annotation, we describe a syntactic annotation scheme developed for the data of second language learners. We encode subcategorization frames and underlying dependencies, in addition to the usual surface dependencies. The annotation scheme is relatively independent of language and can be mapped to learner errors.
Journal:
- Journal of Child Language
Volume 37, Issue 3
Pages: -
Publication date: 2010